Speaker Identification for Swiss German with Spectral and Rhythm Features
Authors
Athanasios Lykartsis (1), Stefan Weinzierl (1), and Volker Dellwo (2)
(1) Audio Communication Group, Technische Universität Berlin, Germany
(2) Phonetics Laboratory, Universität Zürich, Switzerland
Correspondence should be addressed to Athanasios Lykartsis.
Abstract
We present results of speech rhythm analysis for automatic speaker identification, extending previous experiments that used similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in a speaker identification task to determine to what extent they are descriptive of speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The paper focuses on evaluation on one corpus (Swiss German, TEVOID) using support vector machines. Results suggest that the general spectral features can provide very good performance on this dataset, whereas the rhythm features are less successful, indicating either that they are unsuitable for this task or that the result is specific to this dataset.
Originally published at: Lykartsis, Athanasios; Weinzierl, Stefan; Dellwo, Volker (2017). Speaker Identification for Swiss German with Spectral and Rhythm Features. In: 2017 AES International Conference on Semantic Audio, Erlangen, June 2017.
DOI: https://doi.org/10.17743/aesconf.2017.978-1-942220-15-2
Posted at the Zurich Open Repository and Archive, University of Zurich. ZORA URL: https://doi.org/10.5167/uzh-138208
Audio Engineering Society Conference Paper. Presented at the Conference on Semantic Audio, 2017 June 22 – 24, Erlangen, Germany. This paper was peer-reviewed as a complete manuscript for presentation at this conference. This paper is available in the AES E-Library (http://www.aes.org/e-lib), all rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
Similar resources
Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors.
Between-speaker variability of acoustically measurable speech rhythm [%V, ΔV(ln), ΔC(ln), and Δpeak(ln)] was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read seven lexically identical sentences under five different intended tempo conditions (very slow, slow, norma...
Perception of levels of emotion in prosody
Prosody conveys information about the emotional state of the speaker. In this study we test whether listeners are able to detect different levels in the emotional state of the speaker based on prosodic features such as intonation, speech rate and intensity. We ran a perception experiment in which we ask Swiss German and Chinese listeners to recognize the intended emotions that the professional ...
Using Exciting and Spectral Envelope Information and Matrix Quantization for Improvement of the Speaker Verification Systems
Speaker verification from a few spoken words or sentences has many applications. Many methods, such as DTW, HMM, VQ, and MQ, can be used for speaker verification. We applied MQ for its precise, reliable, and robust performance with computational simplicity. We also used pitch frequency and log gain contour to further improve system performance.
Automatic Identification of Gender from Speech
Identifying the gender of a speaker from speech has a variety of applications ranging from speech analytics to personalizing human-machine interactions. While gender identification in previous work has explored the use of the statistical properties of the speaker’s pitch features, in this paper, we explore the impact of using spectral features in conjunction with pitch features on identifying g...
Publication date: 2017